Điện Biên Province
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
Damani, Mehul, Puri, Isha, Slocum, Stewart, Shenfeld, Idan, Choshen, Leshem, Kim, Yoon, Andreas, Jacob
When language models (LMs) are trained via reinforcement learning (RL) to generate natural language "reasoning chains", their performance improves on a variety of difficult question answering tasks. Today, almost all successful applications of RL for reasoning use binary reward functions that evaluate the correctness of LM outputs. Because such reward functions do not penalize guessing or low-confidence outputs, they often have the unintended side-effect of degrading calibration and increasing the rate at which LMs generate incorrect responses (or "hallucinate") in other problem domains. This paper describes RLCR (Reinforcement Learning with Calibration Rewards), an approach to training reasoning models that jointly improves accuracy and calibrated confidence estimation. During RLCR, LMs generate both predictions and numerical confidence estimates after reasoning. They are trained to optimize a reward function that augments a binary correctness score with a Brier score -- a scoring rule for confidence estimates that incentivizes calibrated prediction. We first prove that this reward function (or any analogous reward function that uses a bounded, proper scoring rule) yields models whose predictions are both accurate and well-calibrated. We next show that across diverse datasets, RLCR substantially improves calibration with no loss in accuracy, on both in-domain and out-of-domain evaluations -- outperforming both ordinary RL training and classifiers trained to assign post-hoc confidence scores. While ordinary RL hurts calibration, RLCR improves it. Finally, we demonstrate that verbalized confidence can be leveraged at test time to improve accuracy and calibration via confidence-weighted scaling methods. Our results show that explicitly optimizing for calibration can produce more generally reliable reasoning models.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (5 more...)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
Heart Rate Classification in ECG Signals Using Machine Learning and Deep Learning
This study addresses the classification of heartbeats from ECG signals through two distinct approaches: traditional machine learning utilizing hand-crafted features and deep learning via transformed images of ECG beats. The dataset underwent preprocessing steps, including downsampling, filtering, and normalization, to ensure consistency and relevance for subsequent analysis. In the first approach, features such as heart rate variability (HRV), mean, variance, and RR intervals were extracted to train various classifiers, including SVM, Random Forest, AdaBoost, LSTM, Bi-directional LSTM, and LightGBM. The second approach involved transforming ECG signals into images using Gramian Angular Field (GAF), Markov Transition Field (MTF), and Recurrence Plots (RP), with these images subsequently classified using CNN architectures like VGG and Inception. Experimental results demonstrate that the LightGBM model achieved the highest performance, with an accuracy of 99% and an F1 score of 0.94, outperforming the image-based CNN approach (F1 score of 0.85). Models such as SVM and AdaBoost yielded significantly lower scores, indicating limited suitability for this task. The findings underscore the superior ability of hand-crafted features to capture temporal and morphological variations in ECG signals compared to image-based representations of individual beats. Future investigations may benefit from incorporating multi-lead ECG signals and temporal dependencies across successive beats to enhance classification accuracy further.
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.05)
- Asia > Vietnam > Điện Biên Province > Điện Biên Phủ (0.04)
Past Meets Present: Creating Historical Analogy with Large Language Models
Li, Nianqi, Yuan, Siyu, Chen, Jiangjie, Liang, Jiaqing, Wei, Feng, Liang, Zujie, Yang, Deqing, Xiao, Yanghua
Historical analogies, which compare known past events with contemporary but unfamiliar events, are important abilities that help people make decisions and understand the world. However, research in applied history suggests that people have difficulty finding appropriate analogies. And previous studies in the AI community have also overlooked historical analogies. To fill this gap, in this paper, we focus on the historical analogy acquisition task, which aims to acquire analogous historical events for a given event. We explore retrieval and generation methods for acquiring historical analogies based on different large language models (LLMs). Furthermore, we propose a self-reflection method to mitigate hallucinations and stereotypes when LLMs generate historical analogies. Through human evaluations and our specially designed automatic multi-dimensional assessment, we find that LLMs generally have a good potential for historical analogies. And the performance of the models can be further improved by using our self-reflection method.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > Middle East > Iraq (0.04)
- Asia > China > Hong Kong (0.04)
- (15 more...)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- (2 more...)
Learning Generalization and Regularization of Nonhomogeneous Temporal Poisson Processes
Van, Son Nguyen, Xuan, Hoai Nguyen
The Poisson process, especially the nonhomogeneous Poisson process (NHPP), is an essentially important counting process with numerous real-world applications. Up to date, almost all works in the literature have been on the estimation of NHPPs with infinite data using non-data driven binning methods. In this paper, we formulate the problem of estimation of NHPPs from finite and limited data as a learning generalization problem. We mathematically show that while binning methods are essential for the estimation of NHPPs, they pose a threat of overfitting when the amount of data is limited. We propose a framework for regularized learning of NHPPs with two new adaptive and data-driven binning methods that help to remove the ad-hoc tuning of binning parameters. Our methods are experimentally tested on synthetic and real-world datasets and the results show their effectiveness.
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Russia (0.04)
- (7 more...)